Alzheimer's disease (AD) is a brain disease that significantly impairs a
person's ability to remember and behave normally. Neuroimaging data have
been used with several approaches to extract patterns associated with the
various stages of AD and to distinguish between them. However, because the
brain patterns of older adults and of patients in different stages are
similar, researchers have had difficulty classifying them. In this paper, the
50-layer residual neural network (ResNet) is modified by adding extra
convolution layers to make the extracted features more diverse. In addition,
the ReLU activation function is replaced with Leaky ReLU, because ReLU sets
the negative parts of its input to zero and retains only the positive parts.
These negative inputs may contain useful feature information that could aid
in the development of high-level discriminative features. Thus, Leaky ReLU
was used instead of ReLU to prevent any potential loss of input information.
To train the network from scratch without overfitting, we added a dropout
layer before the fully connected layer. The proposed method successfully
classified the four stages of AD with an accuracy of 97.49% and with 98% for
precision, recall, and F1-score.


This is an open access article under the CC BY-SA license. 


Corresponding Author: 


Abbas Hanon Hassin Alasadi 

Faculty of Computer Science and Information Technology, University of Basrah 
Basrah, Iraq 

Email: abbas.hassin@uobasrah.edu.iq


1. INTRODUCTION 

Alzheimer's disease (AD) is a neurological disease. Its symptoms are a decline in memory, impaired
cognitive skills, lack of reasoning and judgment, and difficulties with expression and comprehension [1].
AD is mainly attributed to the higher levels of amyloid [2], [3] and tau protein in patients' brains
compared with the normal brains of older people [4], [5]. The increasing growth of amyloid plaques and tau
neurofibrillary tangles leads to a loss of the nerve cells' ability to communicate with one another and,
eventually, to cell death. The first brain area to be negatively impacted by the disease is the hippocampus [6].
Due to the hippocampus's critical role in learning and memory, forgetting people and events is the first sign
of AD. The number of people with Alzheimer's disease will grow quickly as life expectancy increases.
According to the statistics of the World Alzheimer Report [7], an estimated 50 million people had Alzheimer's
disease in 2015. By 2050, this number is expected to increase to 131.5 million people worldwide.

Presently, no treatment or drug can delay or stop the progression of Alzheimer's disease, necessitating
effective and precise methods for detecting the disease early and preventing it from progressing to late
stages. The practice of assigning items to a specific set based on their characteristics is called
classification [8]. One of the specific goals of artificial intelligence research is to classify diseases [9]. A
variety of neuroimaging methods assist in the classification of AD, for instance, magnetic resonance
imaging (MRI), computerized tomography (CT), functional MRI (fMRI), positron emission tomography (PET),
magnetoencephalography (MEG), and electroencephalography (EEG). MRI technologies are the most commonly
used because images obtained by MRI show the affected areas darker than the healthy regions [10]. Figure 1
depicts several brain MRI images illustrating various stages of Alzheimer's disease. Dechter introduced the
term deep learning (DL) in 1986 [11]. Its techniques, especially convolutional neural networks (CNN), have
gained traction in a variety of fields, including image processing and analysis, and have achieved good
performance in many computer vision tasks and medical imaging applications such as cerebral microbleeds
(CMBs) detection [12].

This paper presents a modified version of the 50-layer residual neural network (ResNet) architecture 
that can solve the classification problem of four stages of Alzheimer's disease. Alzheimer's disease 
progressively deteriorates brain tissue in a predictable pattern. It reduces the size of the hippocampus and 
cerebral cortex of the brain while increasing the size of the ventricles [13]. Some outstanding research has 
been conducted on automated Alzheimer's disease diagnosis. The method of Jongkreangkrai et al. [14]
consisted of two essential phases. In the feature extraction phase, they used MRI brain images to extract
feature sets containing the volumes of the hippocampus and amygdala as well as the thickness of the
entorhinal cortex. In the classification phase, they used the support vector machine algorithm to
differentiate between Alzheimer's patients and healthy subjects based on the extracted features. Because the
amount of medical data available is small in comparison with other fields, and because training deep neural
networks requires substantial computing resources, researchers have developed transfer learning as a
promising alternative. Jain et al. [15] used a VGG16 network pre-trained on the ImageNet dataset
as a feature extractor for classifying brain MRI images into three categories: CN, MCI,
MD, and AD as shown in Figure 1(a) to 1(d). Odusami et al. [16] introduced a new brain slice classification 
approach based on the ResNet18 algorithm. Puente-Castro et al. [10] used the first 47 layers of ResNet to 
extract features from sagittal MRI images. They then added the patient's age and gender to the feature vectors 
extracted by ResNet. 




Figure 1. Brain MRI images for various stages of Alzheimer's disease: (a) cognitively normal (CN), 
(b) mild demented (MCI), (c) moderate demented (MD), and (d) very mild demented (AD) 


The support vector machine (SVM) technique is then used as a classifier on these extracted feature
vectors to determine whether the patient is in any stage of Alzheimer's disease or is CN. Rabeh et al. [17]
proposed an application for the early detection of Alzheimer's disease. The application framework has two
steps. The first is segmentation using regions of interest (ROI) to isolate three critical regions: the
hippocampus, corpus callosum, and cortex. The second is a classification step that applies SVM to each
region and then a decision tree to make the final decision. Ji et al. [18] developed a method for early
diagnosis of Alzheimer's disease using MRI images of human brains, with ResNet50, NASNet, and MobileNet as
base classifiers trained in an end-to-end process.


2. THE PROPOSED METHOD 

In medical diagnosis, accuracy matters more than anything else, even more than the speed of
diagnosis. Wrongly diagnosing a healthy person as a patient causes severe consequences and psychological
pressure, while wrongly diagnosing a patient as healthy allows the disease to develop because the wrong
diagnosis delays treatment. Therefore, an automated medical diagnostic system must achieve a high level of
accuracy.


2.1. Convolutional neural networks (CNN) 

Convolutional neural networks are used in various applications, including image classification,
segmentation, and pattern recognition [19], [20]. Due to its autonomous nature, the CNN has developed into a
critical tool for machine vision and artificial intelligence. A CNN is a particular type of neural network
that operates directly on the image pixels without prior processing [21]. The CNN architecture consists of
three primary layers: the convolution layer, the pooling layer, and the fully connected layer. The
convolution layer is the building block of the CNN algorithm. It is in charge of extracting the essential and
beneficial features from the input images using a set of trainable filters, forming a feature map [22].
Between successive convolutions, a pooling layer reduces the dimensions of the feature map [23], thereby
lowering the computational cost of the subsequent convolution layer, which contributes to the acceleration
of training and the enhancement of generalization ability [24]. Typically, fully connected layers are
inserted at the end of the CNN structure and are used for recognition and classification. Every node in a
fully connected layer is linked by trainable weights to the nodes of the adjacent fully connected layers. The
final fully connected layer produces an output whose size equals the number of classes. A complete CNN is
built by stacking these layers, and, with a few exceptions, most CNNs follow this same overall architecture.
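
The interplay of the convolution and pooling layers described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the 5x5 input, the 2x2 filter values, and the function names `conv2d_valid` and `max_pool2d` are assumptions chosen for illustration. As in most deep learning frameworks, the "convolution" here is computed as a sliding dot product without flipping the filter.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 single-channel input and 2x2 filter (illustrative values).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(feature_map)           # 2x2 map after pooling
print(len(feature_map), len(feature_map[0]), pooled)
```

The pooled map has a quarter of the entries of the feature map, which is the reduction in computational cost for the next convolution layer that the paragraph refers to.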

Fully training a new CNN from scratch is not without its challenges. First, a CNN needs large
amounts of labeled training data, which may be difficult to obtain, especially in medical imaging. In
addition, training a CNN requires substantial computing and memory resources; without them, the training
process takes a long time. Tuning hyperparameters is also time-consuming and complicated, and a poor choice
can result in overfitting or underfitting, which leads to poor model performance.
Researchers have demonstrated a promising alternative method known as transfer learning to overcome 
these obstacles. Transfer learning means improving the learning of a new task through transferring 
knowledge from a previously learned task [25]. The fundamental goal of this research is to compare the 
performance of pre-trained ResNet50 against the modified residual network in detecting and automatically 
classifying Alzheimer's disease using MRI scans. 


2.2. Proposed framework 

This section presents the general architecture of the proposed framework. The proposed framework 
consists of the following stages: data collection, data preparation, model selection, training stage to build the 
model that helps in diagnosis, validation, and evaluation. Each stage is independent of the others and is
responsible for implementing a specific function. At the same time, the stages communicate with each
other, since the result of one stage is the input to the next stage. Figure 2 describes the proposed
framework.


[Flowchart; recoverable elements: Start; Data Collection Stage; Model Selection Stage (CNN network
selection (ResNetF), hyperparameter tuning); Validation Stage on validation set; decision on improvement in
validation accuracy]

Figure 2. Flowchart of the proposed framework for early diagnosis of Alzheimer’s disease


2.2.1. Data preparation

Data collection is the first step in the machine learning pipeline for training the selected model. The
predictions of ML systems are only as good as the data used to train them. So, the first stage of the
proposed framework is to collect data from sources related to this work in order to solve the research
problem and evaluate the results. In this study, the Alzheimer's brain MRI dataset was obtained from the
open-access Kaggle website. The dataset contains 6,400 images with a size of 176×208 pixels. It has four
classes (NonDemented, MildDemented, ModerateDemented, VeryMildDemented) with a non-uniform distribution of
the images per class.

The second step is resizing each image to 125×125 pixels in order to decrease the training time of
the neural network: the more pixels in an image, the more input nodes there are, which raises the model's
complexity. After that, because of the non-uniform distribution of images across the classes, we generated
new training examples by applying data augmentation to the training set only, to improve the generalization
capability of the deep neural network and prevent overfitting. Horizontal flipping is the augmentation
technique used in this step. It mirrors an image about its vertical axis; in other words, it reverses the
order of the pixels in every row, so that the columns of the image appear in reverse order.
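
The effect of the resizing and flipping steps can be checked with a small sketch. It assumes an image stored as a list of pixel rows; the helper names `horizontal_flip` and `pixel_count` and the tiny 2x3 example image are hypothetical and are not taken from the paper's code.

```python
def horizontal_flip(image):
    """Mirror an image left to right by reversing the pixel order of every row."""
    return [row[::-1] for row in image]


def pixel_count(side_a, side_b):
    """Number of pixels, i.e. input nodes, for a single-channel image."""
    return side_a * side_b


original_pixels = pixel_count(176, 208)  # size of the dataset images
resized_pixels = pixel_count(125, 125)   # size after the resizing step

tiny = [[1, 2, 3],
        [4, 5, 6]]                       # illustrative 2x3 "image"
flipped = horizontal_flip(tiny)

print(original_pixels, resized_pixels, flipped)
```

Resizing reduces the number of input values per image from 36,608 to 15,625, and flipping an image twice returns the original, so each flipped copy is a distinct but equally valid training example.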

The data set is separated into two independent sets: the training set, which has 5,121 images, and
the testing set, which has 1,279 images. The training set was then partitioned into a 20% validation
set and an 80% new training set. The proposed model is trained using the new training set, while the
validation set is used to evaluate the model's performance periodically during the training phase in order
to detect overfitting. The testing set is later used to evaluate how well the model generalizes to unseen
data. Shuffling is the last step in data preparation. Its most critical purpose is preventing the model
from learning the order of the training samples. This step also helps the training converge quickly so that
the network can generalize better. The validation and testing data are not shuffled. During the validation
and testing phases, we only calculate accuracy and loss, and neither is sensitive to the order of the
samples, so shuffling would not affect the results on the testing and validation data.
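
A minimal sketch of the shuffle-then-split step is given below. It works on sample indices only and ignores the augmented copies; the function name `split_train_validation`, the fixed seed, and the rounding of the 20% share down to a whole number are assumptions, since the paper does not state them.

```python
import random


def split_train_validation(samples, validation_fraction=0.2, seed=0):
    """Shuffle once, then hold out a fraction of the samples for validation."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Indices standing in for the 5,121 training images mentioned in the text.
train_ids, val_ids = split_train_validation(range(5121))
print(len(train_ids), len(val_ids))
```

Under these assumptions the 5,121 training images yield 4,097 images for the new training set and 1,024 for validation, with no image appearing in both.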


2.2.2. Network architecture 

The residual neural network, known as ResNet, is a deep neural network that uses shortcuts, called
"skip connections", to jump over some layers [26], [27]. ResNet has demonstrated outstanding performance
in computer vision, so it is used in the current research. He et al. [28] introduced ResNet (published in
2016), and it earned first place in the ILSVRC 2015 classification competition with a 3.57% top-5 error rate.

ResNet starts from the observation that deeper networks are more difficult to optimize, even though
a deeper model should be capable of performing at least as well as a shallower one by copying the shallower
model's learned parameters and setting the additional layers to identity mappings. To aid in the
optimization of deeper models, residual blocks are designed to fit a residual mapping F(x) rather than the
desired underlying mapping H(x), and entire ResNet architectures are built by stacking residual blocks.
Figure 3 illustrates the concept of a residual block. If we assume that the input is x, the output of the
convolution layers is F(x), which is added to the input x, and the resulting output H(x) = F(x) + x is
passed to the next layer. Fitting the residual is significantly easier than fitting an identity mapping
through a stack of nonlinear layers, and the shortcut does not require the network to include additional
parameters or calculations. At the same time, it can significantly increase the training speed and
effectiveness of the model as the number of layers increases. This residual block structure effectively
mitigates the vanishing-gradient problem in deep networks [28]. There are two types of residual blocks in
ResNet. The first type is suitable for shallower networks, while the second type (bottleneck) is
recommended for networks with 50 or more layers. Additionally, the two types have a similar level of time
complexity.
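
The relation H(x) = F(x) + x can be made concrete with a toy sketch. To stay self-contained it uses small fully connected layers in place of the convolution layers of a real residual block and omits batch normalization and the activation that ResNet applies after the addition; the names `linear`, `relu`, and `residual_block` and all weights are illustrative assumptions.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, b1, w2, b2):
    """H(x) = F(x) + x, where F is linear -> ReLU -> linear."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    return [fi + xi for fi, xi in zip(f, x)]   # skip connection adds the input


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]

# With all-zero weights F(x) = 0, so the block reduces to the identity mapping.
as_identity = residual_block(x, zeros, no_bias, zeros, no_bias)
# With identity weights F(x) = relu(x), so H(x) = relu(x) + x.
with_residual = residual_block(x, identity, no_bias, identity, no_bias)
print(as_identity, with_residual)
```

The first call shows why extra residual blocks need not hurt a deeper model: driving the weights of F toward zero is enough to recover the identity mapping.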

In this paper, an improved residual neural network based on ResNet50 (named ResNetF) is
proposed. The number of convolution layers is increased to 58. More generally, as the number of network
layers increases, the features extracted from the different layers become more diverse and richer.
Additionally, the deeper the network, the more abstract the extracted features. Improving the network's
feature extraction ability in this way improves its effectiveness in AD diagnosis. To avoid overfitting, we
added a dropout layer before the fully connected layer in this architecture and set the dropout ratio to
50%. ResNet50 uses ReLU as its activation function. In a CNN, ReLU sets the negative parts of its input to
zero and retains the positive parts. However, these negative inputs may contain useful feature
information that could aid in the development of discriminative high-level features [29].
If a neuron's output is 0 for every input, its gradient is also zero, so the neuron's weights are never
updated and the neuron is never activated again. When the network contains a large number of such inactive
neurons, the convergence of the model becomes extremely difficult. This is referred to as the “dying ReLU”
problem. It may prevent the network from learning and result in underperformance. To address this problem,
Leaky ReLU is used instead of ReLU to prevent any potential loss of input information. Leaky rectified
linear activation (LReLU) adds an alpha parameter to the negative semi-axis of ReLU, resulting in a small
but non-zero gradient. Nodes that were previously inactive with ReLU will now have their weights adjusted.
Figure 4 shows our enhanced network. ReLU is given by (1), (2):


$$f(x_i) = \max(0, x_i) \tag{1}$$

$$f(x_i) = \begin{cases} x_i, & x_i \ge 0 \\ 0, & x_i < 0 \end{cases} \tag{2}$$

Leaky ReLU is defined as (3), (4):

$$f(x_i) = \max(a_i x_i, x_i) \tag{3}$$

$$f(x_i) = \begin{cases} x_i, & x_i \ge 0 \\ a_i x_i, & x_i < 0 \end{cases} \tag{4}$$

where $a_i$ is a predefined parameter that falls in the range (0, 1), usually 0.01, and $x_i$ is the input
to the activation
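
The difference between the two activation functions, and why Leaky ReLU avoids the dying ReLU problem, can be seen in a short sketch. The value alpha = 0.01 follows the text; the function names and the convention used for the gradient at exactly zero are assumptions.

```python
def relu(x):
    """ReLU: negative inputs are set to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: negative inputs are scaled by alpha instead of discarded."""
    return x if x >= 0 else alpha * x


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU (taken as alpha at x = 0 by convention)."""
    return 1.0 if x > 0 else alpha


for value in (-3.0, 0.5):
    print(value, relu(value), leaky_relu(value),
          relu_grad(value), leaky_relu_grad(value))
```

For a negative input, ReLU returns zero with a zero gradient, so no weight update can flow through that neuron, whereas Leaky ReLU keeps a small output and a gradient of alpha.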

Figure 4. Modified residual neural network (ResNetF) structure

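
The dropout layer placed before the fully connected layer can be sketched as follows. The paper only states the 50% ratio; the "inverted dropout" scaling used here (survivors are multiplied by 1/(1-p) during training, and nothing changes at evaluation time) is the common formulation and is an assumption about the implementation, as are the function name and the example features.

```python
import random


def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)       # evaluation: pass values through unchanged
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


features = [1.0, 2.0, 3.0, 4.0]        # hypothetical inputs to the classifier
train_out = dropout(features, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(features, p=0.5, training=False)
print(train_out, eval_out)
```

Because a different random subset of features is removed at every training step, the fully connected layer cannot rely on any single feature, which is the mechanism by which dropout limits overfitting.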

2.2.3. Training stage

In this stage, we used the training set and the backpropagation algorithm to train our proposed
network (ResNetF). We used Adam as the optimizer with a learning rate of 1e-5 and cross-entropy as the loss
function. After each epoch during the training phase, we assessed the model's performance by computing the
validation accuracy and validation loss on our validation set.
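
The two ingredients named above can be written out for a single sample and a single weight. This is a sketch, not the training code: the learning rate 1e-5 comes from the text, while the values of beta1, beta2, and eps are the commonly used Adam defaults and are assumed here, as are the function names.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def adam_step(w, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar weight; t is the 1-based step number."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Four equal logits: the model is maximally unsure among the four classes.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
w_new, m_new, v_new = adam_step(w=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(uniform_loss, w_new)
```

With four classes, an uninformed prediction gives a loss of ln 4, about 1.386, which is a useful reference point when reading the validation loss. The first Adam step moves the weight by roughly the learning rate regardless of the gradient's magnitude.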


2.2.4. Evaluation stage

In this stage, popular criteria for assessing the output of machine learning models are used to
measure the classification model's ability to classify the testing dataset accurately. Accuracy, precision,
recall, and F1-score are the common measurement tools, as given in (5) to (8), respectively. True positive,
true negative, false positive, and false negative predictions for a given class label are represented by the
variables TP, TN, FP, and FN, respectively [30], [31].


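$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$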
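
For a multi-class problem such as the four AD stages, these measures are usually computed per class in a one-vs-rest fashion from a confusion matrix. The sketch below assumes that convention; the function name `per_class_metrics` and the confusion matrix are hypothetical and do not reproduce the paper's results.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class k.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                          # class k missed
    fp = sum(confusion[i][k] for i in range(n)) - tp     # others labeled as k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted).
confusion = [
    [8, 1, 0, 1],
    [0, 9, 0, 1],
    [0, 0, 5, 0],
    [2, 0, 0, 8],
]
class0 = per_class_metrics(confusion, 0)
class2 = per_class_metrics(confusion, 2)
print(class0, class2)
```

In this made-up example class 0 has TP = 8, FP = 2, and FN = 2, so precision, recall, and F1-score are all 0.8, while class 2 is classified perfectly.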


3. RESULTS AND DISCUSSION 

This section explains the results of the research and gives a comprehensive discussion. The
proposed network was trained for 13 hours, 54 minutes, and 53 seconds. The PyTorch framework and
Python 3.7.10 were used to run the experiments on an Nvidia Tesla P100 GPU with 25 GB of memory.

To evaluate the classification model on the testing data, several criteria are used to assess all
stages of Alzheimer's disease: precision, recall, F1-score, and accuracy. According to the results of
our experiment, the training accuracy was 99%, and the validation accuracy was 97%. The results of the
evaluation are summarized in Table 1.

The proposed model has been compared with a previous study [32] that addressed a similar problem
and used the same data samples. The model achieves encouraging results and surpasses the previous
state-of-the-art in all criteria. The success can be attributed to expanding the number of convolutional
layers, adding a dropout layer, and replacing the ReLU activation function with Leaky ReLU in our modified
network.


Table 1. Performance of the ResNetF model 


Class                Model                     Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)

Non Demented         Yildirim and Cinar [32]   90             90              96.42        93.09
                     ResNetF                   96.56          99              97           98
Mild Demented        Yildirim and Cinar [32]   96.6           96.6            90.62        93.51
                     ResNetF                   96.64          99              97           98
Moderate Demented    Yildirim and Cinar [32]   70             70              70           70
                     ResNetF                   100            100             100          100
Very Mild Demented   Yildirim and Cinar [32]   90             90              90           90
                     ResNetF                   99             95              99           97


4. CONCLUSION 

An accurate diagnosis of Alzheimer's disease allows the patient to receive the most appropriate
treatment. This challenging task has attracted many researchers, who have built many computer-aided
diagnosis (CAD) systems to diagnose AD. This paper presented an enhanced residual neural network to
classify four stages of Alzheimer's disease. By increasing the number of convolution layers, the network can
effectively capture as many AD biomarkers as possible. In addition, replacing the ReLU activation function
with Leaky ReLU avoids losing valuable features that could assist in the construction of high-level
discriminative features. To avoid overfitting while training all layers of our architecture from scratch,
we added a dropout layer before the fully connected layer. The experimental findings demonstrate that the
enhanced residual neural network is suitable for Alzheimer's disease diagnosis.